Acquisition of Phraseological Units from Linguistically Interpreted Corpora: A Case Study on German PP-Verb Collocations

Author

  • Brigitte Krenn
Abstract

In this paper, we show that accessibility of syntactic information eases collocation extraction from corpora, and supports identification of lexical and structural restrictions related to collocations. For collocation identification we use a corpus that is automatically annotated using a part-of-speech tagger and a phrase chunker.
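
As a rough illustration of how chunk annotation can ease candidate extraction, the sketch below pairs each prepositional phrase in a chunked sentence with the sentence's full verb. The data layout (a list of `(label, tokens)` chunks with labels `NP`, `PP`, `VC`), the STTS-style tags, and the function name `pp_verb_candidates` are assumptions made for this example; they are not the annotation format or procedure used in the paper.

```python
from collections import Counter

def pp_verb_candidates(sentence):
    """Yield (preposition, noun, verb) triples from one chunked sentence.

    `sentence` is a list of (label, tokens) chunks; tokens are (word, pos)
    pairs. Chunk labels and POS tags are illustrative assumptions.
    """
    # Collect full verbs (tags starting with "VV") from verb chunks.
    verbs = [word for label, tokens in sentence if label == "VC"
             for word, pos in tokens if pos.startswith("VV")]
    if not verbs:
        return
    verb = verbs[-1]  # simplification: treat the last full verb as the main verb
    for label, tokens in sentence:
        if label != "PP":
            continue
        # "APPR" also matches "APPRART" (preposition fused with article).
        preps = [w for w, p in tokens if p.startswith("APPR")]
        nouns = [w for w, p in tokens if p == "NN"]
        if preps and nouns:
            yield (preps[0], nouns[-1], verb)

# Hypothetical chunked sentence: "Er stellt die Frage zur Diskussion"
sentence = [
    ("NP", [("Er", "PPER")]),
    ("VC", [("stellt", "VVFIN")]),
    ("NP", [("die", "ART"), ("Frage", "NN")]),
    ("PP", [("zur", "APPRART"), ("Diskussion", "NN")]),
]
counts = Counter(pp_verb_candidates(sentence))
```

The sketch works on surface forms only; a realistic pipeline would presumably lemmatize the verb and noun before counting, so that inflected variants of the same collocation are pooled.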

Similar resources

Extraction of V-N-Collocations from Text Corpora: A Feasibility Study for German

The usefulness of a statistical approach suggested by Church and Hanks (1989) is evaluated for the extraction of verb-noun (V-N) collocations from German text corpora. Some motivations for the extraction of V-N collocations from corpora are given and a couple of differences concerning the German language are mentioned that have implications on the applicability of extraction methods developed f...
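
The statistical approach of Church and Hanks (1989) scores word pairs with an association ratio based on mutual information, $\log_2 \frac{P(x, y)}{P(x)\,P(y)}$. The following is a minimal sketch of that score over a list of already extracted verb-noun pairs. The function name `association_ratio` and the toy data are hypothetical, and estimating probabilities directly from the pair list (rather than from windowed corpus counts) is a simplifying assumption.

```python
import math
from collections import Counter

def association_ratio(pairs):
    """Return log2(P(v, n) / (P(v) * P(n))) for each distinct (verb, noun) pair.

    Probabilities are relative frequencies within `pairs`, so the score
    reduces to log2(c * N / (c_v * c_n)).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    noun_counts = Counter(n for _, n in pairs)
    total = len(pairs)
    return {
        (v, n): math.log2(c * total / (verb_counts[v] * noun_counts[n]))
        for (v, n), c in pair_counts.items()
    }

# Hypothetical lemmatized verb-noun pairs.
pairs = [
    ("stellen", "Frage"),
    ("stellen", "Frage"),
    ("treffen", "Entscheidung"),
    ("stellen", "Entscheidung"),
]
scores = association_ratio(pairs)
```

Pairs that co-occur more often than their individual frequencies predict receive a positive score. The measure is known to overrate rare pairs, so a minimum frequency threshold is commonly applied before ranking.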

CDB - A Database of Lexical Collocations

CDB is a relational database designed for the particular needs of representing lexical collocations. The relational model is defined such that competence-based descriptions of collocations (the competence base) and actually occurring collocation examples extracted from text corpora (the example base) complete each other. In the paper, the relational model is described and examples for the repre...

Experiments on Candidate Data for Collocation Extraction

The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).

Using chunked corpora for the acquisition of collocations and idiomatic expressions

This paper discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the combinatory preferences of adjectives with ...

Towards a corpus-based dictionary of German noun-verb collocations

We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should be about collocations and it should include a description of their linguistic properties, rather than listing the mere lexical cooccurrence. Since most statistical collocation finding tools do not provide other than lexic...

Publication date: 1998